On generating near-optimal tableaux for conditional functional dependencies
Authors
Abstract
Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has been no prior work on generating pattern tableaux, which is critical to realizing the full potential of CFDs. This paper is the first to formally characterize a “good” pattern tableau, based on the naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an “on-demand” algorithm that provides the same approximation bound and outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.
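As a rough illustration of the ideas named in the abstract, the following Python sketch scores candidate patterns for an FD and picks a small tableau greedily. The abstract does not give formal definitions, so everything here is an assumption made for illustration: support is taken as the fraction of tuples a pattern matches, confidence as the largest fraction of the matched tuples that can be kept so that the FD holds, and the greedy step is a plain set-cover heuristic. The names `matches`, `cover`, `confidence`, `greedy_tableau` and the toy relation are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

WILDCARD = "_"  # assumed marker for "any value" in a pattern


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cover(rows, pattern):
    """Indices of the rows matched by the pattern."""
    return {i for i, r in enumerate(rows) if matches(r, pattern)}


def confidence(rows, idxs, lhs, rhs):
    """Largest fraction of the covered rows that can be kept so lhs -> rhs holds.

    Assumed definition: within each group of equal lhs values, keep only the
    rows carrying the most frequent rhs value.
    """
    if not idxs:
        return 0.0
    groups = defaultdict(Counter)
    for i in idxs:
        row = rows[i]
        groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(idxs)


def greedy_tableau(rows, lhs, rhs, candidates, min_conf, min_support):
    """Greedy set-cover style selection of patterns (a sketch, not the paper's algorithm)."""
    if not rows:
        return [], 0.0
    eligible = []
    for p in candidates:
        c = cover(rows, p)
        if c and confidence(rows, c, lhs, rhs) >= min_conf:
            eligible.append((p, c))
    target = min_support * len(rows)
    covered, tableau = set(), []
    while len(covered) < target and eligible:
        # Pick the pattern that adds the most not-yet-covered rows.
        best = max(eligible, key=lambda pc: len(pc[1] - covered))
        if not best[1] - covered:
            break  # no remaining pattern adds coverage
        tableau.append(best[0])
        covered |= best[1]
    return tableau, len(covered) / len(rows)


# Toy relation for the FD [country, zip] -> city (made-up data).
rows = [
    {"country": "US", "zip": "10001", "city": "NYC"},
    {"country": "US", "zip": "10001", "city": "NYC"},
    {"country": "US", "zip": "60601", "city": "Chicago"},
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "Leith"},
    {"country": "UK", "zip": "SW1", "city": "London"},
]
lhs, rhs = ["country", "zip"], "city"
us = {"country": "US", "zip": WILDCARD}
uk = {"country": "UK", "zip": WILDCARD}
anything = {"country": WILDCARD, "zip": WILDCARD}

tableau, support = greedy_tableau(
    rows, lhs, rhs, [us, uk, anything], min_conf=0.9, min_support=0.5
)
```

In this toy relation only the US pattern reaches the assumed confidence threshold of 0.9, because the UK rows disagree on the city for one zip code, so the selected tableau holds that single pattern and covers half of the tuples. Lowering `min_conf` would let the all-wildcard pattern qualify, which shows how the thresholds trade off confidence against a shorter tableau.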
Similar resources
A Unified Hierarchy for Functional Dependencies, Conditional Functional Dependencies and Association Rules
Conditional Functional Dependencies (CFDs) are Functional Dependencies (FDs) that hold on a fragment relation of the original relation. In this paper, we show the hierarchy between FDs, CFDs and Association Rules (ARs): FDs are the union of CFDs while CFDs are the union of ARs. We also show the link between Approximate Functional Dependencies (AFDs) and approximate ARs. In this paper, we show t...
Testing Implication of Probabilistic Dependencies
Axiomatization has been widely used for testing logical implications. This paper suggests a non-axiomatic method, the chase, to test if a new dependency follows from a given set of probabilistic dependencies. Although the chase computation may require exponential time in some cases, this technique is a powerful tool for establishing nontrivial theoretical results. More importantly, this a...
Comparison of Conditional Functional Dependencies using Fast CFD and CTANE Algorithms
Conditional Functional Dependencies (CFDs) are an extension of Functional Dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we take 4 techniques for cleaning the data from sample relati...
The Theory of Functional and Subset Dependencies Over Relational Expressions
A formal system for reasoning about functional dependencies (FDs) and subset dependencies (SDs) defined over relational expressions is described. An FD e: X → Y indicates that Y is functionally dependent on X in the relation denoted by expression e; an SD e ⊆ f indicates that the relation denoted by e is a subset of that denoted by f. The system is shown to be sound and complete by resorting to ...
Automated Reasoning to Infer all Minimal Keys
Wastl introduced for the first time a tableaux-like method based on an inference system for deriving all minimal keys from a relational schema. He introduced two inference rules and built an automated method over them. In this work we tackle the key finding problem with a tableaux method, but we will use two inference rules inspired by the Simplification Logic for Functional Dependencies. Wastl’s m...
Journal title:
- PVLDB
Volume 1, Issue
Pages -
Publication date 2008